Unusual Features of the SARS-CoV-2 Genome Suggesting Sophisticated
Laboratory Modification Rather Than Natural Evolution and Delineation
of Its Probable Synthetic Route


Li-Meng Yan (MD, PhD)¹, Shu Kang (PhD)¹, Jie Guan (PhD)¹, Shanchang Hu (PhD)¹


¹Rule of Law Society & Rule of Law Foundation, New York, NY, USA.


Correspondence: team.lmyan@gmail.com 


Abstract 

The COVID-19 pandemic caused by the novel coronavirus SARS-CoV-2 has led to over 910,000 deaths 
worldwide and unprecedented decimation of the global economy. Despite its tremendous impact, the 
origin of SARS-CoV-2 has remained mysterious and controversial. The natural origin theory, although 
widely accepted, lacks substantial support. The alternative theory that the virus may have come from a 
research laboratory is, however, strictly censored in peer-reviewed scientific journals. Nonetheless,
SARS-CoV-2 shows biological characteristics that are inconsistent with a naturally occurring, zoonotic 
virus. In this report, we describe the genomic, structural, medical, and literature evidence, which, when 
considered together, strongly contradicts the natural origin theory. The evidence shows that SARS-CoV-2
should be a laboratory product created by using bat coronaviruses ZC45 and/or ZXC21 as a template
and/or backbone. Building upon the evidence, we further postulate a synthetic route for SARS-CoV-2,
demonstrating that the laboratory-creation of this coronavirus is convenient and can be accomplished in 
approximately six months. Our work emphasizes the need for an independent investigation into the 
relevant research laboratories. It also argues for a critical look into certain recently published data, which, 
albeit problematic, was used to support and claim a natural origin of SARS-CoV-2. From a public health 
perspective, these actions are necessary as knowledge of the origin of SARS-CoV-2 and of how the virus
entered the human population is of pivotal importance in the fundamental control of the COVID-19
pandemic as well as in preventing similar, future pandemics. 





Introduction 

COVID-19 has caused a world-wide pandemic, the scale and severity of which are unprecedented.
Despite the tremendous efforts taken by the global community, management and control of this pandemic
remains difficult and challenging.

As a coronavirus, SARS-CoV-2 differs significantly from other respiratory and/or zoonotic viruses: it
attacks multiple organs; it is capable of undergoing a long period of asymptomatic infection; it is highly
transmissible and significantly lethal in high-risk populations; it is well-adapted to humans since the very
start of its emergence; it is highly efficient in binding the human ACE2 receptor (hACE2), the affinity of
which is greater than that associated with the ACE2 of any other potential host.

The origin of SARS-CoV-2 is still the subject of much debate. A widely cited Nature Medicine
publication has claimed that SARS-CoV-2 most likely came from nature. However, the article and its
central conclusion are now being challenged by scientists from all over the world. In addition, authors
of this Nature Medicine article show signs of conflicts of interest, raising further concerns about the
credibility of this publication.

The existing scientific publications supporting a natural origin theory rely heavily on a single piece of
evidence - a previously discovered bat coronavirus named RaTG13, which shares a 96% nucleotide
sequence identity with SARS-CoV-2. However, the existence of RaTG13 in nature and the truthfulness
of its reported sequence are being widely questioned. It is noteworthy that scientific journals have
clearly censored any dissenting opinions that suggest a non-natural origin of SARS-CoV-2. Because of
this censorship, articles questioning either the natural origin of SARS-CoV-2 or the actual existence of
RaTG13, although of high quality scientifically, can only exist as preprints or other non-peer-reviewed
articles published on various online platforms. Nonetheless, analyses of these reports have
repeatedly pointed to severe problems and a probable fraud associated with the reporting of RaTG13.

Therefore, the theory that fabricated scientific data has been published to mislead the world’s efforts
in tracing the origin of SARS-CoV-2 has become substantially convincing and is interlocked with the
notion that SARS-CoV-2 is of a non-natural origin.

Consistent with this notion, genomic, structural, and literature evidence also suggests a non-natural
origin of SARS-CoV-2. In addition, abundant literature indicates that gain-of-function research has long
advanced to the stage where viral genomes can be precisely engineered and manipulated to enable the
creation of novel coronaviruses possessing unique properties. In this report, we present such evidence and
the associated analyses. Part 1 of the report describes the genomic and structural features of SARS-CoV-2,
the presence of which could be consistent with the theory that the virus is a product of laboratory
modification beyond what could be afforded by simple serial viral passage. Part 2 of the report describes
a highly probable pathway for the laboratory creation of SARS-CoV-2, key steps of which are supported
by evidence present in the viral genome. Importantly, part 2 should be viewed as a demonstration of how
SARS-CoV-2 could be conveniently created in a laboratory in a short period of time using available
materials and well-documented techniques. This report is produced by a team of experienced scientists
using our combined expertise in virology, molecular biology, structural biology, computational biology,
vaccine development, and medicine.

1. Has SARS-CoV-2 been subjected to in vitro manipulation?

We present three lines of evidence to support our contention that laboratory manipulation is part of the
history of SARS-CoV-2:

i. The genomic sequence of SARS-CoV-2 is suspiciously similar to that of a bat coronavirus 
discovered by military laboratories in the Third Military Medical University (Chongqing, China) 
and the Research Institute for Medicine of Nanjing Command (Nanjing, China). 

ii. The receptor-binding motif (RBM) within the Spike protein of SARS-CoV-2, which determines 
the host specificity of the virus, resembles that of SARS-CoV from the 2003 epidemic in a 
suspicious manner. Genomic evidence suggests that the RBM has been genetically manipulated. 

iii. SARS-CoV-2 contains a unique furin-cleavage site in its Spike protein, which is known to greatly 
enhance viral infectivity and cell tropism. Yet, this cleavage site is completely absent in this 
particular class of coronaviruses found in nature. In addition, rare codons associated with this 
additional sequence suggest the strong possibility that this furin-cleavage site is not the product of 
natural evolution and could have been inserted into the SARS-CoV-2 genome artificially by 
techniques other than simple serial passage or multi-strain recombination events inside co-infected 
tissue cultures or animals. 


1.1 Genomic sequence analysis reveals that ZC45, or a closely related bat coronavirus, should be
the backbone used for the creation of SARS-CoV-2


The structure of the ~30,000-nucleotide-long SARS-CoV-2 genome is shown in Figure 1. Searching
the NCBI sequence database reveals that, among all known coronaviruses, there were two related bat
coronaviruses, ZC45 and ZXC21, that share the highest sequence identity with SARS-CoV-2 (each bat
coronavirus is ~89% identical to SARS-CoV-2 on the nucleotide level). Similarity between the genome
of SARS-CoV-2 and those of representative β coronaviruses is depicted in Figure 1. ZXC21, which is 97%
identical to and shares a very similar profile with ZC45, is not shown. Note that the RaTG13 virus is
excluded from this analysis given the strong evidence suggesting that its sequence may have been
fabricated and the virus does not exist in nature. (A follow-up report, which summarizes the up-to-date
evidence proving the spurious nature of RaTG13, will be submitted soon.)



Figure 1. Genomic sequence analysis reveals that bat coronavirus ZC45 is the closest match to SARS-CoV-2.

Top: genomic organization of SARS-CoV-2 (2019-nCoV WIV04). Bottom: similarity plot (x-axis: genome
nucleotide position) based on the full-length genome of 2019-nCoV WIV04. Full-length genomes of
SARS-CoV BJ01, bat SARSr-CoV WIV1, bat SARSr-CoV HKU3-1, and bat coronavirus ZC45 were used as
reference sequences.

When SARS-CoV-2 and ZC45/ZXC21 are compared on the amino acid level, a high sequence identity 
is observed for most of the proteins. The Nucleocapsid protein is 94% identical. The Membrane protein 
is 98.6% identical. The S2 portion (2nd half) of the Spike protein is 95% identical. Importantly, the Orf8 
protein is 94.2% identical and the E protein is 100% identical. 
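
The identity percentages quoted above express the fraction of aligned positions at which two sequences carry the same residue. A minimal Python sketch of that calculation follows. The function name percent_identity, the choice to skip gapped columns, and the toy sequences are illustrative assumptions, not the procedure or data used for this report. Alignment tools differ in how they count gaps, so reported values can vary slightly with the convention.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percent identity over aligned columns where neither sequence has a gap.

    Assumption: both inputs come from the same pairwise alignment and
    therefore have equal length; gapped columns are excluded from the count.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    matches = 0
    for a, b in zip(aligned_a.upper(), aligned_b.upper()):
        if a == gap or b == gap:
            continue  # skip columns containing a gap in either sequence
        compared += 1
        if a == b:
            matches += 1
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return 100.0 * matches / compared


# Toy, made-up aligned fragments (NOT real viral protein sequences).
toy_a = "MKTAYIAK-QR"
toy_b = "MKTAHIAKGQR"
identity = percent_identity(toy_a, toy_b)
print(f"{identity:.1f}% identity over ungapped columns")
```

In practice the two strings would come from a pairwise alignment of the proteins being compared (for example, the E or Orf8 proteins), exported from an alignment tool.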

Orf8 is an accessory protein, the function of which is largely unknown in most coronaviruses, although
recent data suggests that Orf8 of SARS-CoV-2 mediates the evasion of host adaptive immunity by
downregulating MHC-I molecules. Normally, Orf8 is poorly conserved in coronaviruses. Sequence BLAST
indicates that, while the Orf8 proteins of ZC45/ZXC21 share a 94.2% identity with SARS-CoV-2 Orf8,
no other coronaviruses share more than 58% identity with SARS-CoV-2 on this particular protein. The
very high homology here on the normally poorly conserved Orf8 protein is highly unusual.

Figure 2. Sequence alignment of the E proteins from different β coronaviruses demonstrates the E protein’s
permissiveness and tendency toward amino acid mutations. A. Mutations have been observed in different strains
of SARS-CoV. GenBank accession numbers: SARS_GD01: AY278489.2, SARS_ExoN1: ACB69908.1,
SARS_TW_GD1: AY451881.1, SARS_Sino1_11: AY485277.1. B. Alignment of E proteins from related bat
coronaviruses indicates its tolerance of mutations at multiple positions. GenBank accession numbers:
Bat_APO40581.1: APO40581.1, RsSHC014: KC881005.1, SC2018: MK211374.1, Bat_NP_828854.1:
NP_828854.1, BtRs-BetaCoV/HuB2013: AIA62312.1, BM48-31/BGR/2008: YP_003858586.1. C. While the early
copies of SARS-CoV-2 share 100% identity on the E protein with ZC45 and ZXC21, sequencing data of SARS-CoV-2
from April 2020 indicates that mutation has occurred at multiple positions. Accession numbers of viruses: Feb_11:
MN997409, ZC45: MG772933.1, ZXC21: MG772934, Apr_13: MT326139, Apr_15_A: MT263389, Apr_15_B:
MT293206, Apr_17: MT350246. Alignments were done using the MultAlin Webserver
(http://multalin.toulouse.inra.fr/multalin/).

The coronavirus E protein is a structural protein, which is embedded in and lines the interior of the
membrane envelope of the virion. The E protein is tolerant of mutations as evidenced in both SARS
(Figure 2A) and related bat coronaviruses (Figure 2B). This tolerance to amino acid mutations of the E
protein is further evidenced in the current SARS-CoV-2 pandemic. After only a short two-month spread
of the virus since its outbreak in humans, the E proteins in SARS-CoV-2 have already undergone
mutational changes. Sequence data obtained during the month of April reveals that mutations have
occurred at four different locations in different strains (Figure 2C). Consistent with this finding, sequence
BLAST analysis indicates that, with the exception of SARS-CoV-2, no known coronaviruses share 100%
amino acid sequence identity on the E protein with ZC45/ZXC21 (suspicious coronaviruses published
after the start of the current pandemic are excluded). Although 100% identity on the E protein has
been observed between SARS-CoV and certain SARS-related bat coronaviruses, none of those pairs
simultaneously share over 83% identity on the Orf8 protein. Therefore, the 94.2% identity on the Orf8
protein, 100% identity on the E protein, and the overall genomic/amino acid-level resemblance between
SARS-CoV-2 and ZC45/ZXC21 are highly unusual. Such evidence, when considered together, is
consistent with a hypothesis that the SARS-CoV-2 genome has an origin based on the use of ZC45/ZXC21
as a backbone and/or template for genetic gain-of-function modifications.

Importantly, ZC45 and ZXC21 are bat coronaviruses that were discovered (between July 2015 and
February 2017), isolated, and characterized by military research laboratories in the Third Military Medical
University (Chongqing, China) and the Research Institute for Medicine of Nanjing Command (Nanjing,
China). The data and associated work were published in 2018. Clearly, this backbone/template, which
is essential for the creation of SARS-CoV-2, exists in these and other related research laboratories.

What strengthens our contention further is the published RaTG13 virus, the genomic sequence of
which is reportedly 96% identical to that of SARS-CoV-2. While suggesting a natural origin of SARS-CoV-2,
the RaTG13 virus also diverted the attention of both the scientific field and the general public
away from ZC45/ZXC21. In fact, a Chinese BSL-3 lab (the Shanghai Public Health Clinical Centre),
which published a Nature article reporting a conflicting close phylogenetic relationship between SARS-CoV-2
and ZC45/ZXC21 rather than with RaTG13, was quickly shut down for “rectification”. It is
believed that the researchers of that laboratory were being punished for having disclosed the connection
between SARS-CoV-2 and ZC45/ZXC21. On the other hand, substantial evidence has accumulated, pointing to severe
problems associated with the reported sequence of RaTG13 as well as questioning the actual existence of
this bat virus in nature. A very recent publication also indicated that the receptor-binding domain
(RBD) of RaTG13’s Spike protein could not bind ACE2 of two different types of horseshoe bats (they
closely relate to the horseshoe bat R. affinis, RaTG13’s alleged natural host), implicating the inability of
RaTG13 to infect horseshoe bats. This finding further substantiates the suspicion that the reported
sequence of RaTG13 could have been fabricated as the Spike protein encoded by this sequence does not
seem to carry the claimed function. The fact that a virus has been fabricated to shift the attention away
from ZC45/ZXC21 speaks for an actual role of ZC45/ZXC21 in the creation of SARS-CoV-2.

1.2 The receptor-binding motif of SARS-CoV-2 Spike cannot be born from nature and should have
been created through genetic engineering

The Spike proteins decorate the exterior of the coronavirus particles. They play an important role in
infection as they mediate the interaction with host cell receptors and thereby help determine the host range
and tissue tropism of the virus. The Spike protein is split into two halves (Figure 3). The front or N-terminal
half is named S1, which is fully responsible for binding the host receptor. In both SARS-CoV
and SARS-CoV-2 infections, the host cell receptor is hACE2. Within S1, a segment of around 70 amino
acids makes direct contacts with hACE2 and is correspondingly named the receptor-binding motif (RBM)
(Figure 3C). In SARS-CoV and SARS-CoV-2, the RBM fully determines the interaction with hACE2.
The C-terminal half of the Spike protein is named S2. The main function of S2 includes maintaining trimer
formation and, upon successive protease cleavages at the S1/S2 junction and a downstream S2’ position,
mediating membrane fusion to enable cellular entry of the virus.



Figure 3. Structure of the SARS Spike protein and how it binds to the hACE2 receptor. Pictures were generated
based on PDB ID: 6acf. A) Three spike proteins, each consisting of an S1 half and an S2 half, form a trimer. B) The
S2 halves (shades of blue) are responsible for trimer formation, while the S1 portion (shades of red) is responsible
for binding hACE2 (dark gray). C) Details of the binding between S1 and hACE2. The RBM of S1, which is
important and sufficient for binding, is colored in orange. Residues within the RBM that are important for either
hACE2 interaction or protein folding are shown as sticks (residue numbers follow the SARS Spike sequence).

Figure 4. Sequence alignment of the spike proteins from relevant coronaviruses. Viruses being compared include
SARS-CoV-2 (Wuhan-Hu-1: NC_045512, 2019-nCoV_USA-AZ1: MN997409), bat coronaviruses (Bat_CoV_ZC45:
MG772933, Bat_CoV_ZXC21: MG772934), and SARS coronaviruses (SARS_GZ02: AY390556, SARS:
NC_004718.3). Region marked by two orange lines is the receptor-binding motif (RBM), which is important for
interaction with the hACE2 receptor. Essential residues are additionally highlighted by red sticks on top. Region
marked by two green lines is a furin-cleavage site that exists only in SARS-CoV-2 but not in any other lineage B β
coronavirus.

Similar to what is observed for other viral proteins, S2 of SARS-CoV-2 shares a high sequence identity
(95%) with S2 of ZC45/ZXC21. In stark contrast, between SARS-CoV-2 and ZC45/ZXC21, the S1
protein, which dictates which host (human or bat) the virus can infect, is much less conserved with the
amino acid sequence identity being only 69%.

Figure 4 shows the sequence alignment of the Spike proteins from six β coronaviruses. Two are viruses
isolated from the current pandemic (Wuhan-Hu-1, 2019-nCoV_USA-AZ1); two are the suspected
template viruses (Bat_CoV_ZC45, Bat_CoV_ZXC21); two are SARS coronaviruses (SARS_GZ02,
SARS). The RBM is highlighted in between two orange lines. Clearly, despite the high sequence identity
for the overall genomes, the RBM of SARS-CoV-2 differs significantly from those of ZC45 and ZXC21.
Intriguingly, the RBM of SARS-CoV-2 resembles, to a great degree, the RBM of SARS Spike. Although
this is not an exact “copy and paste”, careful examination of the Spike-hACE2 structures reveals that
all residues essential for either hACE2 binding or protein folding (orange sticks in Figure 3C and what is
highlighted by red short lines in Figure 4) are “kept”. Most of these essential residues are precisely
preserved, including those involved in disulfide bond formation (C467, C474) and electrostatic
interactions (R444, E452, R453, D454), which are pivotal for the structural integrity of the RBM (Figure
3C and 4). The few changes within the group of essential residues are almost exclusively hydrophobic
“substitutions” (I428→L, L443→F, F460→Y, L472→F, Y484→Q), which should not affect either
protein folding or the hACE2-interaction. At the same time, the majority of the amino acid residues that are
non-essential have “mutated” (Figure 4, RBM residues not labeled with short red lines). Judging from this
sequence analysis alone, we were convinced early on that not only would the SARS-CoV-2 Spike protein
bind hACE2 but also the binding would resemble, precisely, that between the original SARS Spike protein
and hACE2. Recent structural work has confirmed our predictions.

As elaborated below, the way that SARS-CoV-2 RBM resembles SARS-CoV RBM and the overall 
sequence conservation pattern between SARS-CoV-2 and ZC45/ZXC21 are highly unusual. Collectively, 
this suggests that portions of the SARS-CoV-2 genome have not been derived from natural quasi-species 
viral particle evolution. 

If SARS-CoV-2 does indeed come from natural evolution, its RBM could have only been acquired in 
one of the two possible routes: 1) an ancient recombination event followed by convergent evolution or 2) 
a natural recombination event that occurred fairly recently. 

In the first scenario, the ancestor of SARS-CoV-2, a ZC45/ZXC21-like bat coronavirus, would have
recombined and “swapped” its RBM with a coronavirus carrying a relatively “complete” RBM (in
reference to SARS). This recombination would result in a novel ZC45/ZXC21-like coronavirus with all
the gaps in its RBM “filled” (Figure 4). Subsequently, the virus would have to adapt extensively in its new
host, where the ACE2 protein is highly homologous to hACE2. Random mutations across the genome
would have to have occurred to eventually shape the RBM to its current form - resembling SARS-CoV
RBM in a highly intelligent manner. However, this convergent evolution process would also result in the
accumulation of a large number of mutations in other parts of the genome, rendering the overall sequence
identity relatively low. The high sequence identity between SARS-CoV-2 and ZC45/ZXC21 on various
proteins (94-100% identity) does not support this scenario and, therefore, clearly indicates that SARS-CoV-2
carrying such an RBM cannot come from a ZC45/ZXC21-like bat coronavirus through this convergent
evolutionary route.

In the second scenario, the ZC45/ZXC21-like coronavirus would have to have recently recombined
and swapped its RBM with another coronavirus that had successfully adapted to bind an animal ACE2
highly homologous to hACE2. The likelihood of such an event depends, in part, on the general
requirements of natural recombination: 1) that the two different viruses share significant sequence
similarity; 2) that they must co-infect and be present in the same cell of the same animal; 3) that the
recombinant virus would not be cleared by the host or make the host extinct; 4) that the recombinant virus
eventually would have to become stable and transmissible within the host species.

In regard to this recent recombination scenario, the animal reservoir could not be bats because the
ACE2 proteins in bats are not homologous enough to hACE2 and therefore the adaptation would not be able
to yield an RBM sequence as seen in SARS-CoV-2. This animal reservoir also could not be humans as
the ZC45/ZXC21-like coronavirus would not be able to infect humans. In addition, there has been no
evidence of any SARS-CoV-2 or SARS-CoV-2-like virus circulating in the human population prior to late
2019. Intriguingly, according to a recent bioinformatics study, SARS-CoV-2 was well-adapted for humans
since the start of the outbreak.

Only one other possibility of natural evolution remains, which is that the ZC45/ZXC21-like virus and
a coronavirus containing a SARS-like RBM could have recombined in an intermediate host where the
ACE2 protein is homologous to hACE2. Several laboratories have reported that some of the Sunda
pangolins smuggled into China from Malaysia carried coronaviruses, the receptor-binding domain (RBD)
of which is almost identical to that of SARS-CoV-2. They then went on to suggest that pangolins
are the likely intermediate host for SARS-CoV-2. However, recent independent reports have found
significant flaws in this data. Furthermore, contrary to these reports, no coronaviruses have been
detected in Sunda pangolin samples collected for over a decade in Malaysia and Sabah between 2009 and
2019. A recent study also showed that the RBD, which is shared between SARS-CoV-2 and the reported
pangolin coronaviruses, binds to hACE2 ten times more strongly than to the pangolin ACE2, further dismissing
pangolins as the possible intermediate host. Finally, an in silico study, while echoing the notion that
pangolins are not likely an intermediate host, also indicated that none of the animal ACE2 proteins
examined in their study exhibited more favorable binding potential to the SARS-CoV-2 Spike protein than
hACE2 did. This last study virtually exempted all animals from their suspected roles as an intermediate
host, which is consistent with the observation that SARS-CoV-2 was well-adapted for humans from the
start of the outbreak. This is significant because these findings collectively suggest that no intermediate
host seems to exist for SARS-CoV-2, which at the very least diminishes the possibility of a recombinant
event occurring in an intermediate host.

Even if we ignore the above evidence that no proper host exists for the recombination to take place and 
instead assume that such a host does exist, it is still highly unlikely that such a recombination event could 
occur in nature. 

As we have described above, if a natural recombination event is responsible for the appearance of SARS-CoV-2,
then the ZC45/ZXC21-like virus and a coronavirus containing a SARS-like RBM would have to
recombine in the same cell by swapping the S1/RBM, which is a rare form of recombination. Furthermore,
since SARS has occurred only once in human history, it would be at least equally rare for nature to produce
a virus that resembles SARS in such an intelligent manner - having an RBM that differs from the SARS
RBM only at a few non-essential sites (Figure 4). The possibility that this unique SARS-like coronavirus
would reside in the same cell with the ZC45/ZXC21-like ancestor virus and the two viruses would
recombine in the “RBM-swapping” fashion is extremely low. Importantly, this, and the other
recombination event described below in section 1.3 (even less likely to occur in nature), would both
have to happen to produce a Spike as seen in SARS-CoV-2.

While the above evidence and analyses together appear to disprove a natural origin of SARS-CoV-2’s
RBM, abundant literature shows that gain-of-function research, where the Spike protein of a
coronavirus was specifically engineered, has repeatedly led to the successful generation of human-infecting
coronaviruses from coronaviruses of non-human origin.

Records also show that research laboratories, for example, the Wuhan Institute of Virology (WIV),
have successfully carried out such studies working with US researchers and also working alone. In
addition, the WIV has engaged in decades-long coronavirus surveillance studies and therefore owns the
world’s largest collection of coronaviruses. Evidently, the technical barrier is non-existent for the WIV
and other related laboratories to carry out and succeed in such Spike/RBM engineering and gain-of-function
research.

Figure 5. Two restriction sites are present at either end of the RBM of SARS-CoV-2, providing convenience for
replacing the RBM within the spike gene. A. Nucleotide sequence of the RBM of SARS-CoV-2 (Wuhan-Hu-1). An
EcoRI site is found at the 5’-end of the RBM and a BstEII site at the 3’-end. B. Although these two restriction sites
do not exist in the original spike gene of ZC45, they can be conveniently introduced given that the sequence
discrepancy is small (2 nucleotides) in either case. C. Amino acid sequence alignment with the RBM region
highlighted (color and underscore). The RBM highlighted in orange (top) is what is defined by the EcoRI and BstEII
sites in the SARS-CoV-2 (Wuhan-Hu-1) spike. The RBM highlighted in magenta (middle) is the region swapped by
Dr. Fang Li and colleagues into a SARS Spike backbone. The RBM highlighted in blue (bottom) is from the Spike
protein (RBM: 424-494) of SARS-BJ01 (AY278488.2), which was swapped by the Shi lab into the Spike proteins of
different bat coronaviruses replacing the corresponding segments.

Strikingly, consistent with the RBM engineering theory, we have identified two unique restriction sites,
EcoRI and BstEII, at either end of the RBM of the SARS-CoV-2 genome, respectively (Figure 5A). These
two sites, which are popular choices of everyday molecular cloning, do not exist in the rest of this spike
gene. This particular setting makes it extremely convenient to swap the RBM within spike, providing a
quick way to test different RBMs and the corresponding Spike proteins.
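
Whether a recognition site is unique within a gene can be checked with a simple pattern scan. The Python sketch below reports 0-based start positions of the EcoRI (GAATTC) and BstEII (GGTNACC) recognition sequences in a nucleotide string. The function name find_sites and the toy sequence are hypothetical, introduced for illustration only; a real check would load the spike gene from the GenBank records cited in Figure 4.

```python
import re

# Recognition sequences in IUPAC notation (N = any base). Both are
# palindromic, so scanning a single strand is sufficient.
SITES = {"EcoRI": "GAATTC", "BstEII": "GGTNACC"}
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T", "N": "[ACGT]"}


def find_sites(seq: str, sites: dict = SITES) -> dict:
    """Return {enzyme: [0-based start positions]} for each recognition site."""
    seq = seq.upper()
    hits = {}
    for name, motif in sites.items():
        pattern = "".join(IUPAC[ch] for ch in motif)
        # A lookahead keeps overlapping occurrences from being skipped.
        hits[name] = [m.start() for m in re.finditer(f"(?=({pattern}))", seq)]
    return hits


# Toy, made-up sequence (NOT taken from any viral genome).
toy = "ATGGAATTCCCTTTGGTAACCAAA"
for enzyme, positions in find_sites(toy).items():
    print(enzyme, positions, "unique" if len(positions) == 1 else "not unique")
```

A site is unique in the scanned region when its position list has exactly one entry.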

Such EcoRI and BstEII sites do not exist in the spike genes of other β coronaviruses, which strongly
indicates that they were unnatural and were specifically introduced into this spike gene of SARS-CoV-2
for the convenience of manipulating the critical RBM. Although ZC45 spike also does not have these two
sites (Figure 5B), they can be introduced very easily as described in part 2 of this report.

It is noteworthy that introduction of the EcoRI site here would change the corresponding amino acids
from -WNT- to -WNS- (Figure 5AB). As far as we know, all SARS and SARS-like bat coronaviruses
exclusively carry a T (threonine) residue at this location. SARS-CoV-2 is the only exception in that this T
has mutated to an S (serine), save the suspicious RaTG13 and pangolin coronaviruses published after the
outbreak.

Once the restriction sites were successfully introduced, the RBM segment could be swapped
conveniently using routine restriction enzyme digestion and ligation. Although alternative cloning
techniques may leave no trace of genetic manipulation (Gibson assembly as one example), this old-fashioned
approach could be chosen because it offers a great level of convenience in swapping this critical
RBM.

Given that RBM fully dictates hACE2-binding and that the SARS RBM-hACE2 binding was fully
characterized by high-resolution structures (Figure 3), this RBM-only swap would not be any riskier
than the full Spike swap. In fact, the feasibility of this RBM-swap strategy has been proven. In 2008,
Dr. Zhengli Shi’s group swapped a SARS RBM into the Spike proteins of several SARS-like bat
coronaviruses after introducing a restriction site into a codon-optimized spike gene (Figure 5C). They
then validated the binding of the resulting chimeric Spike proteins with hACE2. Furthermore, in a recent
publication, the RBM of SARS-CoV-2 was swapped into the receptor-binding domain (RBD) of SARS-CoV,
resulting in a chimeric RBD fully functional in binding hACE2 (Figure 5C). Strikingly, in both
cases, the manipulated RBM segments resemble almost exactly the RBM defined by the positions of the
EcoRI and BstEII sites (Figure 5C). Although cloning details are lacking in both publications, it is
conceivable that the actual restriction sites may vary depending on the spike gene receiving the RBM
insertion as well as the convenience in introducing unique restriction site(s) in regions of interest. It is
noteworthy that the corresponding author of this recent publication, Dr. Fang Li, has been an active
collaborator of Dr. Zhengli Shi since 2010. Dr. Li was the first person in the world to have structurally
elucidated the binding between SARS-CoV RBD and hACE2 and has been the leading expert in the
structural understanding of Spike-ACE2 interactions. The striking finding of EcoRI and BstEII
restriction sites at either end of the SARS-CoV-2 RBM, respectively, and the fact that the same RBM
region has been swapped both by Dr. Shi and by her long-term collaborator, respectively, using restriction
enzyme digestion methods are unlikely to be a coincidence. Rather, it is the smoking gun proving that the
RBM/Spike of SARS-CoV-2 is a product of genetic manipulation.

Although it may be convenient to copy the exact sequence of SARS RBM, it would be too clear a sign
of artificial design and manipulation. The more deceiving approach would be to change a few non-essential
residues, while preserving the ones critical for binding. This design could be well-guided by the
high-resolution structures (Figure 3). This way, when the overall sequence of the RBM would appear
to be more distinct from that of the SARS RBM, the hACE2-binding ability would be well-preserved. We
believe that all of the crucial residues (residues labeled with red sticks in Figure 4, which are the same
residues shown in sticks in Figure 3C) should have been “kept”. As described earlier, while some should
be direct preservation, some should have been switched to residues with similar properties, which would
not disrupt hACE2-binding and may even strengthen the association further. Importantly, changes might
have been made intentionally at non-essential sites, making it less like a “copy and paste” of the SARS
RBM.

1.3 An unusual furin-cleavage site is present in the Spike protein of SARS-CoV-2 and is associated 
with the augmented virulence of the virus 

Another unique motif in the Spike protein of SARS-CoV-2 is a polybasic furin-cleavage site located at 
the S1/S2 junction (Figure 4, segment in between two green lines). Such a site can be recognized and 
cleaved by the furin protease. Within the lineage B of β coronaviruses and with the exception of 
SARS-CoV-2, no viruses contain a furin-cleavage site at the S1/S2 junction (Figure 6). In contrast, a 
furin-cleavage site at this location has been observed in other groups of coronaviruses. Certain selective 
pressure seems to be in place that prevents the lineage B of β coronaviruses from acquiring or maintaining 
such a site in nature. 

Figure 6. Furin-cleavage site found at the S1/S2 junction of Spike is unique to SARS-CoV-2 and absent in other 
lineage B β coronaviruses. Figure reproduced from Hoffmann, et al. 

As previously described, during the cell entry process, the Spike protein is first cleaved at the S1/S2 
junction. This step, and a subsequent cleavage downstream that exposes the fusion peptide, are both 
mediated by host proteases. The presence or absence of these proteases in different cell types greatly 
affects the cell tropism and presumably the pathogenicity of the viral infection. Unlike other proteases, 
furin protease is widely expressed in many types of cells and is present at multiple cellular and 
extracellular locations. Importantly, the introduction of a furin-cleavage site at the S1/S2 junction could 
significantly enhance the infectivity of a virus as well as greatly expand its cell tropism, a phenomenon 
well-documented in both influenza viruses and other coronaviruses. 

If we leave aside the fact that no furin-cleavage site is found in any lineage B β coronavirus in nature 
and instead assume that this site in SARS-CoV-2 is a result of natural evolution, then only one 
evolutionary pathway is possible, which is that the furin-cleavage site has to be derived from a 
homologous recombination event. Specifically, an ancestor β coronavirus containing no furin-cleavage 
site would have to recombine with a closely related coronavirus that does contain a furin-cleavage site. 

However, two facts disfavor this possibility. First, although some coronaviruses from other groups or 
lineages do contain polybasic furin-cleavage sites, none of them contains the exact polybasic sequence 
present in SARS-CoV-2 (-PRRAR/SVA-). Second, between SARS-CoV-2 and any coronavirus containing 
a legitimate furin-cleavage site, the sequence identity on Spike is no more than 40%. Such a low level 
of sequence identity rules out the possibility of a successful homologous recombination ever occurring 
between the ancestors of these viruses. Therefore, the furin-cleavage site within the SARS-CoV-2 Spike 
protein is unlikely to be of natural origin and instead should be a result of laboratory modification. 
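
The 40% figure above is a pairwise sequence identity. As a minimal sketch of how such a number is typically computed from two already-aligned sequences, the Python function below counts matching columns and skips columns that contain a gap. The gap-handling convention is an assumption (alignment tools differ on it), and the two short strings are toy inputs invented for illustration only; they are not Spike sequences and do not reproduce the 40% value.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identity over the ungapped columns of a pairwise alignment.

    Assumption: both inputs come from the same alignment (equal length)
    and '-' marks a gap. Columns with a gap in either sequence are skipped.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    compared = 0
    matches = 0
    for x, y in zip(aligned_a.upper(), aligned_b.upper()):
        if x == "-" or y == "-":
            continue  # gap column: not counted
        compared += 1
        if x == y:
            matches += 1
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return 100.0 * matches / compared


# Toy, hypothetical aligned fragments (not real viral sequences).
toy_a = "MKT-LLVAG"
toy_b = "MRTALLIAG"
toy_identity = percent_identity(toy_a, toy_b)
print(f"{toy_identity:.1f}% identity over ungapped columns")
```

In practice the two full-length Spike sequences would first be aligned with a dedicated alignment tool, and the reported identity depends on that tool's settings.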

Consistent with this claim, a close examination of the nucleotide sequence of the furin-cleavage site in 
SARS-CoV-2 spike has revealed that the two consecutive Arg residues within the inserted sequence 
(-PRRA-) are both coded by the rare codon CGG (least used codon for Arg in SARS-CoV-2) (Figure 7). 
In fact, this CGGCGG arrangement is the only instance found in the SARS-CoV-2 genome where this 
rare codon is used in tandem. This observation strongly suggests that this furin-cleavage site should be a 
result of genetic engineering. Adding to the suspicion, a FauI restriction site is formed by the codon 
choices here, suggesting the possibility that restriction fragment length polymorphism, a technique 
that a WIV lab is proficient at, could have been involved. There, the fragmentation pattern resulting from 
FauI digestion could be used to monitor the preservation of the furin-cleavage site in Spike, as this 
furin-cleavage site is prone to deletions in vitro. Specifically, RT-PCR on the spike gene of the recovered 
viruses from cell cultures or laboratory animals could be carried out, the product of which would be 
subjected to FauI digestion. Viruses retaining or losing the furin-cleavage site would then yield distinct 
patterns, allowing convenient tracking of the virus(es) of interest. 

FauI 

tat cag act cag act aat tct cct cgg cgg gca cgt agt gta gct agt caa tcc atc att 
 Y   Q   T   Q   T   N   S   P   R   R   A   R   S   V   A   S   Q   S   I   I 

Figure 7. Two consecutive Arg residues in the -PRRA- insertion at the S1/S2 junction of SARS-CoV-2 Spike are 
both coded by a rare codon, CGG. A FauI restriction site, 5'-(N)6GCGGG-3', is embedded in the coding sequence 
of the “inserted” PRRA segment, which may be used as a marker to monitor the preservation of the introduced 
furin-cleavage site. 
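
The observations stated for Figure 7 can be checked directly on the 60-nucleotide fragment printed there. The following Python sketch is illustrative only: it translates the fragment with the standard genetic code, locates tandem CGG codons, and searches for the motif 5'-(N)6GCGGG-3' exactly as written in the caption. The nucleotide string is the Figure 7 sequence as reconstructed from the printed amino acid line, and the variable and function names are our own. The check covers this fragment only, so it does not test the genome-wide statement about CGG usage.

```python
import re

# Figure 7 fragment, codon by codon (reading frame starts at the first base).
FIG7_CODONS = (
    "tat cag act cag act aat tct cct cgg cgg "
    "gca cgt agt gta gct agt caa tcc atc att"
).upper().split()
FIG7_SEQ = "".join(FIG7_CODONS)

# Standard genetic code (NCBI translation table 1), bases ordered T, C, A, G.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}

def translate(codons):
    """Translate a list of codons into a one-letter amino acid string."""
    return "".join(CODON_TABLE[c] for c in codons)

protein = translate(FIG7_CODONS)

# 0-based codon indices where CGG is immediately followed by another CGG.
tandem_cgg = [
    i for i in range(len(FIG7_CODONS) - 1)
    if FIG7_CODONS[i] == "CGG" and FIG7_CODONS[i + 1] == "CGG"
]

# Motif as written in the Figure 7 caption: six arbitrary bases, then GCGGG.
motif = re.search(r"[ACGT]{6}GCGGG", FIG7_SEQ)
core_start = FIG7_SEQ.find("GCGGG")  # 0-based offset of the GCGGG core

print(protein)
print("tandem CGG at codon index:", tandem_cgg)
print("GCGGG core starts at nucleotide offset:", core_start)
```

The GCGGG core spans the junction of the two CGG codons and the following GCA codon, which is why the codon choice for Arg determines whether the motif is present.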

In addition, although no known coronaviruses contain the exact sequence of -PRRAR/SVA- that is 
present in the SARS-CoV-2 Spike protein, a similar -RRAR/AR- sequence has been observed at the S1/S2 
junction of the Spike protein in a rodent coronavirus, AcCoV-JC34, which was published by Dr. Zhengli 
Shi in 2017. It is evident that the legitimacy of -RRAR- as a functional furin-cleavage site has been 
known to the WIV experts since 2017. 

The evidence collectively suggests that the furin-cleavage site in the SARS-CoV-2 Spike protein may 
not have come from nature and could be the result of genetic manipulation. The purpose of this 
manipulation could have been to assess any potential enhancement of the infectivity and pathogenicity of 
the laboratory-made coronavirus. Indeed, recent studies have confirmed that the furin-cleavage site 
does confer significant pathogenic advantages to SARS-CoV-2. 

1.4 Summary 

Evidence presented in this part reveals that certain aspects of the SARS-CoV-2 genome are extremely 
difficult to reconcile with natural evolution. The alternative theory we suggest is that the 
virus may have been created by using ZC45/ZXC21 bat coronavirus(es) as the backbone and/or template. 
The Spike protein, especially the RBM within it, should have been artificially manipulated, upon which 
the virus has acquired the ability to bind hACE2 and infect humans. This is supported by the finding of a 
unique restriction enzyme digestion site at either end of the RBM. An unusual furin-cleavage site may 
have been introduced and inserted at the S1/S2 junction of the Spike protein, which contributes to the 
increased virulence and pathogenicity of the virus. These transformations have then staged the 
SARS-CoV-2 virus to eventually become a highly-transmissible, onset-hidden, lethal, sequelae-unclear, and 
massively disruptive pathogen. 

Evidently, the possibility that SARS-CoV-2 could have been created through gain-of-function 
manipulations at the WIV is significant and should be investigated thoroughly and independently. 


2. Delineation of a synthetic route of SARS-CoV-2 

In the second part of this report, we describe a synthetic route of creating SARS-CoV-2 in a laboratory 
setting. It is postulated based on substantial literature support as well as genetic evidence present in the 
SARS-CoV-2 genome. Although steps presented herein should not be viewed as exactly those taken, we 
believe that key processes should not be much different. Importantly, our work here should serve as a 
demonstration of how SARS-CoV-2 can be designed and created conveniently in research laboratories by 
following proven concepts and using well-established techniques. 

Importantly, research labs, both in Hong Kong and in mainland China, are leading the world in 
coronavirus research, both in terms of resources and in terms of research outputs. The latter is evidenced not 
only by the large number of publications that they have produced over the past two decades but also by 
their milestone achievements in the field: they were the first to identify civets as the intermediate host for 
SARS-CoV and isolated the first strain of the virus; they were the first to uncover that SARS-CoV 
originated from bats; they revealed for the first time the antibody-dependent enhancement (ADE) of 
SARS-CoV infections; they have contributed significantly in understanding MERS in all domains 
(zoonosis, virology, and clinical studies); they made several breakthroughs in SARS-CoV-2 
research. Last but not least, they have the world’s largest collection of coronaviruses (genomic 
sequences and live viruses). The knowledge, expertise, and resources are all readily available within the 
Hong Kong and mainland research laboratories (they collaborate extensively) to carry out and accomplish 
the work described below. 

Figure 8. Diagram of a possible synthetic route of the laboratory-creation of SARS-CoV-2. 

2.1 Possible scheme in designing the laboratory-creation of the novel coronavirus 

In this sub-section, we outline the possible overall strategy and major considerations that may have 
been formulated at the designing stage of the project. 

To engineer and create a human-targeting coronavirus, they would have to pick a bat coronavirus as 
the template/backbone. This can be conveniently done because many research labs have been actively 
collecting bat coronaviruses over the past two decades. However, this template virus ideally 
should not be one from Dr. Zhengli Shi’s collections, considering that she is widely known to have been 
engaged in gain-of-function studies on coronaviruses. Therefore, ZC45 and/or ZXC21, novel bat 
coronaviruses discovered and owned by military laboratories, would be suitable as the 
template/backbone. It is also possible that these military laboratories had discovered other closely related 
viruses from the same location and kept some unpublished. Therefore, the actual template could be ZC45, 
or ZXC21, or a close relative of them. The postulated pathway described below would be the same 
regardless of which one of the three was the actual template. 

Once they have chosen a template virus, they would first need to engineer, through molecular cloning, 
the Spike protein so that it can bind hACE2. The concept and cloning techniques involved in this 
manipulation have been well-documented in the literature. With almost no risk of failing, the 
template bat virus could then be converted to a coronavirus that can bind hACE2 and infect humans. 

Second, they would use molecular cloning to introduce a furin-cleavage site at the S1/S2 junction of 
Spike. This manipulation, based on existing knowledge, would likely produce a strain of coronavirus 
that is more infectious and pathogenic. 

Third, they would produce an ORF1b gene construct. The ORF1b gene encodes the polyprotein Orf1b, 
which is processed post-translationally to produce individual viral proteins: RNA-dependent RNA 
polymerase (RdRp), helicase, guanine-N7 methyltransferase, uridylate-specific endoribonuclease, and 
2'-O-methyltransferase. All of these proteins are parts of the replication machinery of the virus. Among 
them, the RdRp protein is the most crucial one and is highly conserved among coronaviruses. Importantly, 
Dr. Zhengli Shi’s laboratory uses a PCR protocol, which amplifies a particular fragment of the RdRp gene, 
as their primary method to detect the presence of coronaviruses in raw samples (bat fecal swab, feces, etc). 
As a result of this practice, the Shi group has documented the sequence information of this short segment 
of RdRp for all coronaviruses that they have successfully detected and/or collected. 

Here, the genetic manipulation is less demanding or complicated because Orf1b is conserved and likely 
Orf1b from any β coronavirus would be competent enough to do the work. However, we believe that they 
would want to introduce a particular Orf1b into the virus for one of the two possible reasons: 

1. Since many phylogenetic analyses categorize coronaviruses based on the sequence similarity of 
the RdRp gene, having a different RdRp in the genome could therefore ensure that 
SARS-CoV-2 and ZC45/ZXC21 are separated into different groups/sub-lineages in phylogenetic 
studies. Choosing an RdRp gene, however, is convenient because the short RdRp segment sequence 
has been recorded for all coronaviruses ever collected/detected. Their final choice was the RdRp 
sequence from bat coronavirus RaBtCoV/4991, which was discovered in 2013. For 
RaBtCoV/4991, the only information ever published was the sequence of its short RdRp segment, 
while neither its full genomic sequence nor virus isolation was ever reported. After amplifying 
the RdRp segment (or the whole ORF1b gene) of RaBtCoV/4991, they would have then used it 
for subsequent assembly and creation of the genome of SARS-CoV-2. Small changes in the RdRp 
sequence could either be introduced at the beginning (through DNA synthesis) or be generated via 
passages later on. On a separate track, when they were engaged in the fabrication of the RaTG13 
sequence, they could have started with the short RdRp segment of RaBtCoV/4991 without 
introducing any changes to its sequence, resulting in a 100% nucleotide sequence identity between 
the two viruses on this short RdRp segment. This RaTG13 virus could then be claimed to have 
been discovered back in 2013. 

2. The RdRp protein from RaBtCoV/4991 is unique in that it is superior to RdRp from any other 
β coronavirus for developing antiviral drugs. RdRp has no homologs in human cells, which makes 
this essential viral enzyme a highly desirable target for antiviral development. As an example, 
Remdesivir, which is currently undergoing clinical trials, targets RdRp. When creating a novel 
and human-targeting virus, they would be interested in developing the antidote as well. Even 
though drug discovery like this may not be easily achieved, it is reasonable for them to 
intentionally incorporate an RdRp that is more amenable to antiviral drug development. 

Fourth, they would use reverse genetics to assemble the gene fragments of spike, ORF1b, and the rest 
of the template ZC45 into a cDNA version of the viral genome. They would then carry out in vitro 
transcription to obtain the viral RNA genome. Transfection of the RNA genome into cells would allow 
the recovery of live and infectious viruses with the desired artificial genome. 

Fifth, they would carry out characterization and optimization of the virus strain(s) to improve the fitness, 
infectivity, and overall adaptation using serial passage in vivo. One or several viral strains that meet certain 
criteria would then be obtained as the final product(s). 

2.2 A postulated synthetic route for the creation of SARS-CoV-2 

In this sub-section, we describe in more detail how each step could be carried out in a laboratory 
setting using available materials and routine molecular, cellular, and virologic techniques. A diagram of 
this process is shown in Figure 8. We estimate that the whole process could be completed in approximately 
6 months. 

Step 1: Engineering the RBM of the Spike for hACE2-binding (1.5 months) 

The Spike protein of a bat coronavirus is either incapable of or inefficient in binding hACE2 due to the 
absence of important residues within its RBM. This can be exemplified by the RBM of the template virus 
ZC45 (Figure 4). The first and most critical step in the creation of SARS-CoV-2 is to engineer the Spike 
so that it acquires the ability to bind hACE2. As evidenced in the literature, such manipulations have been 
carried out repeatedly in research laboratories since 2008, which successfully yielded engineered 
coronaviruses with the ability to infect human cells. Although there are many possible ways that 
one can engineer the Spike protein, we believe that what was actually undertaken was that they replaced 
the original RBM with a designed and possibly optimized RBM using SARS’ RBM as a guide. As 
described in part 1, this theory is supported by our observation that two unique restriction sites, EcoRI and 
BstEII, exist at either end of the RBM in the SARS-CoV-2 genome (Figure 5A) and by the fact that such 
RBM-swap has been successfully carried out by Dr. Zhengli Shi and by her long-term collaborator and 
structural biology expert, Dr. Fang Li. 

Although ZC45 spike does not contain these two restriction sites (Figure 5B), they can be introduced 
very easily. The original spike gene would be either amplified with RT-PCR or obtained through DNA 
synthesis (some changes could be safely introduced to certain variable regions of the sequence) followed 
by PCR. The gene would then be cloned into a plasmid using restriction sites other than EcoRI and BstEII. 
Once in the plasmid, the spike gene can be modified easily. First, an EcoRI site can be introduced by 
converting the highlighted “gaacac” sequence (Figure 5B) to the desired “gaattc” (Figure 5A). The 
difference between them is two consecutive nucleotides. Using the commercially available QuikChange 
Site-Directed Mutagenesis kit, such a di-nucleotide mutation can be generated in no more than one week. 
Subsequently, the BstEII site could be similarly introduced at the other end of the RBM. Specifically, the 
“gaatacc” sequence (Figure 5B) would be converted to the desired “ggttacc” (Figure 5A), which would 
similarly require a week of time. 
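
The statement that each pair of quoted sequences differs by two consecutive nucleotides can be verified with a simple position-by-position comparison. The Python sketch below is illustrative only; it uses the short strings quoted in the paragraph above (read here as “gaacac”/“gaattc” and “gaatacc”/“ggttacc”), and the function name is our own.

```python
def mismatch_positions(before: str, after: str) -> list:
    """Return the 1-based positions at which two equal-length strings differ."""
    if len(before) != len(after):
        raise ValueError("sequences must have the same length")
    return [
        i + 1
        for i, (x, y) in enumerate(zip(before.lower(), after.lower()))
        if x != y
    ]


# Short strings as quoted in the text for Figure 5B versus Figure 5A.
ecori_diff = mismatch_positions("gaacac", "gaattc")
bsteii_diff = mismatch_positions("gaatacc", "ggttacc")

print("EcoRI pair differs at positions:", ecori_diff)
print("BstEII pair differs at positions:", bsteii_diff)
```

In both cases the differing positions are adjacent, which is consistent with the description of each change as a di-nucleotide difference.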

Once these restriction sites, which are unique within the spike gene of SARS-CoV-2, were successfully 
introduced, different RBM segments could be swapped in conveniently and the resulting Spike protein 
subsequently evaluated using established assays. 

As described in part 1, the design of an RBM segment could be well-guided by the high-resolution 
structures (Figure 3), yielding a sequence that resembles the SARS RBM in an intelligent manner. 

When carrying out the structure-guided design of the RBM, they would have followed the routine and 
generated a few (for example a dozen) such RBMs with the hope that some specific variant(s) may be 
superior to others in binding hACE2. Once the design was finished, they could have each of the designed 
RBM genes commercially synthesized (quick and very affordable) with an EcoRI site at the 5’-end and a 
BstEII site at the 3’-end. These novel RBM genes could then be cloned into the spike gene, respectively. 
The gene synthesis and subsequent cloning, which could be done in a batch mode for the small library of 
designed RBMs, would take approximately one month. 

These engineered Spike proteins might then be tested for hACE2-binding using the established 
pseudotype virus infection assays. The engineered Spike with good to exceptional binding affinities 
would be selected. (Although not necessary, directed evolution could be involved here (error-prone PCR 
on the RBM gene), coupled with either an in vitro binding assay or a pseudotype virus infection 
assay to obtain an RBM that binds hACE2 with exceptional affinity.) 

Given the abundance of literature on Spike engineering and the available high-resolution 
structures of the Spike-hACE2 complex, the success of this step would be very much guaranteed. By 
the end of this step, as desired, a novel spike gene would be obtained, which encodes a novel Spike protein 
capable of binding hACE2 with high affinity. 

Step 2: Engineering a furin-cleavage site at the S1/S2 junction (0.5 month) 

The product from Step 1, a plasmid containing the engineered spike, would be further modified to 
include a furin-cleavage site (segment indicated by green lines in Figure 4) at the S1/S2 junction. This 
short stretch of gene sequence can be conveniently inserted using several routine cloning techniques, 
including QuikChange Site-Directed PCR, overlap PCR followed by restriction enzyme digestion and 
ligation, or Gibson assembly. None of these techniques would leave any trace in the sequence. 
Whichever cloning method was the choice, the inserted gene piece would be included in the primers, 
which would be designed, synthesized, and used in the cloning. This step, leading to a further modified 
Spike with the furin-cleavage site added at the S1/S2 junction, could be completed in no more than two 
weeks. 

Step 3: Obtain an ORF1b gene that contains the sequence of the short RdRp segment from RaBtCoV/4991 
(1 month, yet can be carried out concurrently with Steps 1 and 2) 

Unlike the engineering of Spike, no complicated design is needed here, except that the RdRp gene 
segment from RaBtCoV/4991 would need to be included. Gibson assembly could have been used here. In 
this technique, several fragments, each adjacent pair sharing 20-40 bp overlap, are combined together in 
one simple reaction to assemble a long DNA product. Two or three fragments, each covering a significant 
section of the ORF1b gene, would be selected based on known bat coronavirus sequences. One of these 
fragments would be the RdRp segment of RaBtCoV/4991. Each fragment would be PCR amplified with 
proper overlap regions introduced in the primers. Finally, all purified fragments would be pooled in 
equimolar concentrations and added to the Gibson reaction mixture, which, after a short incubation, would 
yield the desired ORF1b gene in whole. 

Step 4: Produce the designed viral genome using reverse genetics and recover live viruses (0.5 month) 

Reverse genetics has been frequently used in assembling whole viral genomes, including coronavirus 
genomes. The most recent example is the reconstruction of the SARS-CoV-2 genome using the 
transformation-assisted recombination in yeast. Using this method, the Swiss group assembled the entire 
viral genome and produced live viruses in just one week. This efficient technique, which would not leave 
any trace of artificial manipulation in the created viral genome, has been available since 2017. In 
addition to the engineered spike gene (from steps 1 and 2) and the ORF1b gene (from step 3), other 
fragments covering the rest of the genome would be obtained either through RT-PCR amplification from 
the template virus or through DNA synthesis by following a sequence slightly altered from that of the 
template virus. We believe that the latter approach was more likely as it would allow sequence changes 
to be introduced into the variable regions of less conserved proteins, the process of which could be easily guided 
by multiple sequence alignments. The amino acid sequences of more conserved proteins, such as that of 
the E protein, might have been left unchanged. All DNA fragments would then be pooled together and 
transformed into yeast, where the cDNA version of the SARS-CoV-2 genome would be assembled via 
transformation-assisted recombination. Of course, alternative methods of reverse genetics, one of which 
the WIV has successfully used in the past, could also be employed. Although some earlier 
reverse genetics approaches may leave restriction sites where different fragments would be joined, these 
traces would be hard to detect as the exact site of ligation can be anywhere in the ~30 kb genome. Either 
way, a cDNA version of the viral genome would be obtained from the reverse genetics experiment. 
Subsequently, in vitro transcription using the cDNA as the template would yield the viral RNA genome, 
which upon transfection into Vero E6 cells would allow the production of live viruses bearing all of the 
designed properties. 

Step 5: Optimize the virus for fitness and improve its hACE2-binding affinity in vivo (2.5-3 months) 

Virus recovered from step 4 would need to be further adapted by undergoing the classic experiment: serial 
passage in laboratory animals. This final step would validate the virus’ fitness and ensure its 
receptor-oriented adaptation toward its intended host, which, according to the analyses above, should be human. 
Importantly, the RBM and the furin-cleavage site, which were introduced into the Spike protein separately, 
would now be optimized together as one functional unit. Among various available animal models (e.g. 
mice, hamsters, ferrets, and monkeys) for coronaviruses, hACE2 transgenic mice (hACE2-mice) should 
be the most suitable and convenient choice here. This animal model has been established during the study 
of SARS-CoV and has been available in the Jackson Laboratory for many years. 

The procedure of serial passage is straightforward. Briefly, the selected viral strain from step 4, a 
precursor of SARS-CoV-2, would be intranasally inoculated into a group of anaesthetized hACE2-mice. 
Around 2-3 days post infection, the virus in lungs would usually amplify to a peak titer. The mice would 
then be sacrificed and the lungs homogenized. Usually, the mouse-lung supernatant, which carries the 
highest viral load, would be used to extract the candidate virus for the next round of passage. After 
approximately 10-15 rounds of passage, the hACE2-binding affinity, the infection efficiency, and the 
lethality of the viral strain would be sufficiently enhanced and the viral genome stabilized. Finally, after 
a series of characterization experiments (e.g. viral kinetics assay, antibodies response assay, symptom 
observation and pathology examination), the final product, SARS-CoV-2, would be obtained, concluding 
the whole creation process. From this point on, this viral pathogen could be amplified (most probably 
using Vero E6 cells) and produced routinely. 
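
As a consistency check on the overall estimate of approximately 6 months, the per-step durations given in the step headings of this sub-section can be added up. The Python sketch below is our own arithmetic illustration; the dictionary keys are hypothetical labels, and the only inputs are the durations stated in the headings, including the statement that Step 3 can run concurrently with Steps 1 and 2.

```python
# Durations in months, taken from the step headings as (low, high) bounds.
STEP_MONTHS = {
    "step1": (1.5, 1.5),
    "step2": (0.5, 0.5),
    "step3": (1.0, 1.0),  # stated to be able to run concurrently with steps 1 and 2
    "step4": (0.5, 0.5),
    "step5": (2.5, 3.0),  # given as a range in the heading
}

def total_months(bound: int, step3_concurrent: bool) -> float:
    """Total duration for one bound (0 = low, 1 = high)."""
    d = {name: rng[bound] for name, rng in STEP_MONTHS.items()}
    front = d["step1"] + d["step2"]
    if step3_concurrent:
        # Step 3 overlaps steps 1 and 2, so only the longer branch counts.
        front = max(front, d["step3"])
    else:
        front += d["step3"]
    return front + d["step4"] + d["step5"]

concurrent_range = (total_months(0, True), total_months(1, True))
sequential_range = (total_months(0, False), total_months(1, False))

print("Step 3 concurrent:", concurrent_range, "months")
print("All steps sequential:", sequential_range, "months")
```

With Step 3 run in parallel the headings sum to 5 to 5.5 months, and with all steps run back to back they sum to 6 to 6.5 months, which brackets the stated estimate.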

It is noteworthy that, based on the work done on SARS-CoV, the hACE2-mice, although suitable for 
SARS-CoV-2 adaptation, are not a good model to reflect the virus’ transmissibility and associated clinical 
symptoms in humans. We believe that those scientists might not have used a proper animal model (such 
as the golden Syrian hamster) for testing the transmissibility of SARS-CoV-2 before the outbreak of 
COVID-19. If they had done this experiment with a proper animal model, the highly contagious nature of 
SARS-CoV-2 would have been extremely evident and consequently SARS-CoV-2 would not have been described 
as “not causing human-to-human transmission” at the start of the outbreak. 

We also speculate that the extensive laboratory-adaptation, which is oriented toward enhanced 
transmissibility and lethality, may have driven the virus too far. As a result, SARS-CoV-2 might have lost 
the capacity to attenuate on both transmissibility and lethality during its current adaptation in the human 
population. This hypothesis is consistent with the lack of apparent attenuation of SARS-CoV-2 so far 
despite its great prevalence and with the observation that a recently emerged, predominant variant only 
shows improved transmissibility. 

Serial passage is a quick and intensive process, where the adaptation of the virus is accelerated. 
Although intended to mimic natural evolution, serial passage is much more limited in both time and scale. 
As a result, fewer random mutations would be expected in serial passage than in natural evolution. This is 
particularly true for conserved viral proteins, such as the E protein. Critical in viral replication, the E 
protein is a determinant of virulence and engineering of it may render SARS-CoV-2 attenuated. 
Therefore, at the initial assembly stage, these scientists might have decided to keep the amino acid 
sequence of the E protein unchanged from that of ZC45/ZXC21. Due to the conserved nature of the E 
protein and the limitations of serial passage, no amino acid mutation actually occurred, resulting in a 100% 
sequence identity on the E protein between SARS-CoV-2 and ZC45/ZXC21. The same could have 
happened to the marks of molecular cloning (restriction sites flanking the RBM). Serial passage, which 
should have partially naturalized the SARS-CoV-2 genome, might not have removed all signs of artificial 
manipulation. 


3. Final remarks 

Many questions remain unanswered about the origin of SARS-CoV-2. Prominent virologists have 
implied in a Nature Medicine letter that laboratory escape, while not being entirely ruled out, was 
unlikely and that no sign of genetic manipulation is present in the SARS-CoV-2 genome. However, here 
we show that genetic evidence within the spike gene of the SARS-CoV-2 genome (restriction sites flanking 
the RBM; tandem rare codons used at the inserted furin-cleavage site) does exist and suggests that the 
SARS-CoV-2 genome should be a product of genetic manipulation. Furthermore, the proven concepts, 
well-established techniques, and knowledge and expertise are all in place for the convenient creation of 
this novel coronavirus in a short period of time. 

Motives aside, the following facts about SARS-CoV-2 are well-supported: 

1. If it was a laboratory product, the most critical element in its creation, the backbone/template virus 
(ZC45/ZXC21), is owned by military research laboratories. 

2. The genome sequence of SARS-CoV-2 has likely undergone genetic engineering, through which 
the virus has gained the ability to target humans with enhanced virulence and infectivity. 

3. The characteristics and pathogenic effects of SARS-CoV-2 are unprecedented. The virus is highly 
transmissible, onset-hidden, multi-organ targeting, sequelae-unclear, lethal, and associated with 
various symptoms and complications. 

4. SARS-CoV-2 caused a world-wide pandemic, taking hundreds of thousands of lives and shutting 
down the global economy. It has a destructive power like no other. 

Judging from the evidence that we and others have gathered, we believe that finding the origin of 
SARS-CoV-2 should involve an independent audit of the WIV P4 laboratories and the laboratories of their 
close collaborators. Such an investigation should have taken place long ago and should not be delayed any 
further. 

We also note that in the publication of the chimeric virus SHC014-MA15 in 2015, the attribution of 
funding of Zhengli Shi by the NIAID was initially left out. It was reinstated in the publication in 2016 in 
a corrigendum, perhaps after the meeting in January 2016 to reinstate NIH funding for gain-of-function 
research on viruses. This is unusual scientific behavior, which needs an explanation. 

What is not thoroughly described in this report is the various evidence indicating that several 
coronaviruses recently published (RaTG13, RmYN02, and several pangolin coronaviruses) are 
highly suspicious and likely fraudulent. These fabrications would serve no purpose other than to deceive 
the scientific community and the general public so that the true identity of SARS-CoV-2 is hidden. 
Although exclusion of details of such evidence does not alter the conclusion of the current report, we do 
believe that these details would provide additional support for our contention that SARS-CoV-2 is a 
laboratory-enhanced virus and a product of gain-of-function research. A follow-up report focusing on such 
additional evidence is now being prepared and will be submitted shortly. 